IEEE INFOCOM 2024

Session E-8: Machine Learning 2

Conference: 8:30 AM — 10:00 AM PDT
Local: May 23 Thu, 11:30 AM — 1:00 PM EDT
Location: Regency E

Deep Learning Models As Moving Targets To Counter Modulation Classification Attacks

Naureen Hoque and Hanif Rahbari (Rochester Institute of Technology, USA)

Malicious entities use advanced modulation classification (MC) techniques to launch traffic analysis, selective jamming, evasion, and poisoning attacks. Recent studies show that current defense mechanisms against such attacks are static in nature and vulnerable to persistent adversaries who invest time and resources into learning the defenses, enabling them to design and execute more sophisticated attacks that circumvent them. In this paper, we present a moving-target defense framework to support a novel modulation-masking mechanism we develop against advanced and persistent modulation classification attacks. The modulated symbols are masked with small perturbations before transmission so that they appear to belong to a different modulation scheme. By deploying a pool of deep learning models and perturbation-generation techniques, the defense strategy keeps changing (moving) them when needed, making it difficult for adversaries to keep up with the defense system's changes over time. We show that the overall system performance remains unaffected under our technique. We further demonstrate that our masking technique, like other existing defenses, can be learned and circumvented over time by a persistent adversary unless a moving-target defense approach is adopted.
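As a rough illustration of the masking idea (not the authors' actual scheme; the constellations, perturbation budget, and nearest-point rule below are all assumptions), a toy numpy sketch might nudge each QPSK symbol toward a point of a rotated 8-PSK grid under a small power cap:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy QPSK transmission (unit-power symbols at odd multiples of pi/4).
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 1000)))

# Target constellation the masked symbols should mimic: an 8-PSK grid
# rotated so its points do not coincide with the QPSK points.
psk8 = np.exp(1j * (np.pi / 8 + 2 * np.pi * np.arange(8) / 8))

# Mask: nudge each symbol toward its nearest target point, capping the
# perturbation norm (eps) so the legitimate receiver's EVM stays small.
eps = 0.15
nearest = psk8[np.argmin(np.abs(qpsk[:, None] - psk8[None, :]), axis=1)]
delta = nearest - qpsk
delta *= np.minimum(1.0, eps / np.maximum(np.abs(delta), 1e-12))
masked = qpsk + delta

print("mean perturbation power:", np.mean(np.abs(delta) ** 2))
```

A moving-target deployment would then rotate the budget, the target constellation, and the perturbation generator (e.g., one of a pool of pre-trained DNNs) over time.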

Deep Learning-based Modulation Classification of Practical OFDM signals for Spectrum Sensing

Byungjun Kim (UCSD, USA); Peter Gerstoft (University of California, San Diego, USA); Christoph F Mecklenbräuker (TU Wien, Austria)

In this study, the modulation of symbols on OFDM subcarriers is classified for transmissions following Wi-Fi 6 and 5G downlink specifications. First, our approach estimates the OFDM symbol duration and cyclic prefix length from the cyclic autocorrelation function. We then propose a feature extraction algorithm that characterizes the modulation of OFDM signals and removes the effects of synchronization errors. The extracted feature is converted into a 2D histogram of phase and amplitude, which is taken as input to a convolutional neural network (CNN)-based classifier. The classifier requires no prior knowledge of protocol-specific information such as the Wi-Fi preamble or the resource allocation of 5G physical channels. Evaluated on synthetic and real-world measured over-the-air (OTA) datasets, the classifier achieves a minimum accuracy of 97% on OTA data when the SNR is above the value required for data transmission.
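The 2D histogram input described above is easy to sketch; the following toy example (synthetic 16-QAM symbols standing in for the feature-extracted subcarrier symbols; bin counts and ranges are assumptions) shows the kind of phase-amplitude image the CNN would consume:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic equalized subcarrier symbols: 16-QAM plus noise, standing in
# for the extracted OFDM features after synchronization-error removal.
levels = np.array([-3, -1, 1, 3]) / np.sqrt(10)
syms = rng.choice(levels, 4096) + 1j * rng.choice(levels, 4096)
syms += rng.normal(0, 0.05, 4096) + 1j * rng.normal(0, 0.05, 4096)

# 2D histogram over (phase, amplitude) -- the CNN classifier's input image.
phase = np.angle(syms)
amp = np.abs(syms)
hist, _, _ = np.histogram2d(phase, amp, bins=(32, 32),
                            range=[[-np.pi, np.pi], [0, 2.0]])
hist /= hist.sum()  # normalize so the input is scale independent
print(hist.shape)   # (32, 32) image fed to the CNN
```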

Resource-aware Deployment of Dynamic DNNs over Multi-tiered Interconnected Systems

Chetna Singhal (Indian Institute of Technology Kharagpur, India); Yashuo Wu (University of California Irvine, USA); Francesco Malandrino (CNR-IEIIT, Italy); Marco Levorato (University of California, Irvine, USA); Carla Fabiana Chiasserini (Politecnico di Torino & CNIT, IEIIT-CNR, Italy)

The increasing pervasiveness of intelligent mobile applications requires exploiting the full range of resources offered by the mobile-edge-cloud network for the execution of inference tasks. However, due to the heterogeneity of such multi-tiered networks, it is essential to match the applications' demand to the available resources while minimizing energy consumption. Modern dynamic deep neural networks (DNNs) achieve this goal through multi-branched architectures in which early exits enable sample-based adaptation of the model depth. In this paper, we tackle the problem of allocating sections of DNNs with early exits to the nodes of the mobile-edge-cloud system. Through a 3-stage graph-modeling approach, we represent the possible options for splitting the DNN and deploying the DNN blocks on the multi-tiered network, embedding both the system constraints and the application requirements in a convenient and efficient way. Our framework, named Feasible Inference Graph (FIN), identifies the solution that minimizes the overall inference energy consumption while enabling distributed inference over the multi-tiered network with the target quality and latency. Our results, obtained for DNNs of varying complexity, show that FIN matches the optimum and yields over 65% energy savings relative to a state-of-the-art technique for cost minimization.
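As a loose illustration of the graph idea (node names, energy and latency numbers, and the tier layout below are invented, not FIN's actual construction), one can model split-and-placement options as a DAG and pick the cheapest latency-feasible path:

```python
import networkx as nx

# Each node is (tier, DNN block index); edges carry an energy cost and a
# latency, so a source-to-sink path is one feasible split/placement.
G = nx.DiGraph()
G.add_edge(("src", 0), ("device", 1), energy=1.0, latency=5.0)
G.add_edge(("device", 1), ("edge", 2), energy=2.0, latency=10.0)
G.add_edge(("device", 1), ("device", 2), energy=4.0, latency=3.0)
G.add_edge(("edge", 2), ("cloud", 3), energy=1.5, latency=20.0)
G.add_edge(("device", 2), ("cloud", 3), energy=3.0, latency=18.0)

LATENCY_BUDGET = 40.0
best = None
for path in nx.all_simple_paths(G, ("src", 0), ("cloud", 3)):
    edges = list(zip(path, path[1:]))
    e = sum(G.edges[u, v]["energy"] for u, v in edges)
    l = sum(G.edges[u, v]["latency"] for u, v in edges)
    if l <= LATENCY_BUDGET and (best is None or e < best[0]):
        best = (e, path)
print(best)  # minimum-energy deployment meeting the latency target
```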

Jewel: Resource-Efficient Joint Packet and Flow Level Inference in Programmable Switches

Aristide Tanyi-Jong Akem (IMDEA Networks Institute, Spain & Universidad Carlos III de Madrid, Spain); Beyza Butun (Universidad Carlos III de Madrid & IMDEA Networks Institute, Spain); Michele Gucciardo and Marco Fiore (IMDEA Networks Institute, Spain)

Embedding machine learning (ML) models in programmable switches realizes the vision of high-throughput, low-latency inference at line rate. Recent works have made breakthroughs in embedding Random Forest (RF) models in switches for either packet-level or flow-level inference. The former relies on packet-header features that are simple to implement but limit accuracy in challenging use cases; the latter exploits richer flow features to improve accuracy, but leaves early packets in each flow unclassified. We propose Jewel, an in-switch ML model based on a fully joint packet- and flow-level design, which takes the best of both worlds by classifying early flow packets individually and shifting to flow-level inference when possible. Our proposal involves (i) a single RF model trained to classify both packets and flows, and (ii) hardware-aware model selection and training techniques that minimize the resource footprint. We implement Jewel in P4 and deploy it in a testbed with Intel Tofino switches, where we run extensive experiments with a variety of real-world use cases. Results show that our solution outperforms four state-of-the-art benchmarks, with accuracy gains in the 2.2%-5.3% range.
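A host-side sketch of the joint training idea with scikit-learn follows; the feature choices and zero-padding convention are assumptions, and the actual system compiles the trees into P4 match-action tables rather than running Python on the switch:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# One training row per packet: header features are always present, while
# flow-level aggregates are zero-padded for the first packets of a flow
# and filled in once enough packets have been observed.
n = 2000
pkt_feats = rng.normal(size=(n, 3))    # e.g., size, IAT, TCP flags
flow_feats = rng.normal(size=(n, 3))   # e.g., mean size, duration
early = rng.random(n) < 0.3            # early packets of each flow
flow_feats[early] = 0.0                # flow stats not yet available
X = np.hstack([pkt_feats, flow_feats, early[:, None].astype(float)])
y = rng.integers(0, 2, n)              # toy traffic classes

# A single shallow RF handles both regimes; depth and tree count are kept
# small so the model fits the switch's match-action stage budget.
rf = RandomForestClassifier(n_estimators=5, max_depth=5, random_state=0)
rf.fit(X, y)
print(rf.predict(X[:4]))
```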

Session Chair

Marilia Curado (University of Coimbra, Portugal)

Session E-9: Machine Learning 3

Conference: 10:30 AM — 12:00 PM PDT
Local: May 23 Thu, 1:30 PM — 3:00 PM EDT
Location: Regency E

Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules

Xinglin Pan (Hong Kong Baptist University, Hong Kong); Wenxiang Lin and Shaohuai Shi (Harbin Institute of Technology, Shenzhen, China); Xiaowen Chu (The Hong Kong University of Science and Technology (Guangzhou) & The Hong Kong University of Science and Technology, Hong Kong); Weinong Sun (The Hong Kong University of Science and Technology, Hong Kong); Bo Li (Hong Kong University of Science and Technology, Hong Kong)

Sparsely-activated Mixture-of-Experts (MoE) layers have found practical application in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands. Despite the wide adoption of hybrid parallel paradigms like model parallelism, expert parallelism, and expert-sharding parallelism (i.e., MP+EP+ESP) to support MoE model training on GPU clusters, training efficiency is hindered by the communication costs these paradigms introduce. To address this limitation, we propose Parm, a system that accelerates MP+EP+ESP training with two dedicated schedules for placing communication tasks. The proposed schedules eliminate redundant computations and communications and enable overlaps between intra-node and inter-node communications, ultimately reducing the overall training time. As the two schedules are not mutually exclusive, we provide comprehensive theoretical analyses and derive an automatic and accurate solution to determine which schedule should be applied in different scenarios. Experimental results on an 8-GPU server and a 32-GPU cluster demonstrate that Parm outperforms the state-of-the-art MoE training system DeepSpeed-MoE, achieving a 1.13x-5.77x speedup on 1296 manually configured MoE layers and approximately 3x improvement on two real-world MoE models based on BERT and GPT-2.
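As a hand-wavy illustration of the schedule-selection step (the timing expressions below are invented placeholders, not Parm's derived cost model), one can compare modeled per-step times and pick the faster schedule:

```python
# Pick between two communication schedules for an MoE layer by comparing
# modeled step times; Parm derives such terms analytically, whereas the
# formulas and numbers here are stand-ins for illustration only.
def step_time(t_intra, t_inter, t_comp, overlap_intra_inter):
    if overlap_intra_inter:
        # Schedule B: intra-node and inter-node transfers overlap.
        return max(t_intra, t_inter) + t_comp
    # Schedule A: redundant phases removed, but transfers serialized.
    return t_intra + t_inter + t_comp

cfgs = {"schedule_A": False, "schedule_B": True}
t_intra, t_inter, t_comp = 2.0, 3.5, 4.0  # measured per-layer times (ms)
best = min(cfgs, key=lambda k: step_time(t_intra, t_inter, t_comp, cfgs[k]))
print(best, step_time(t_intra, t_inter, t_comp, cfgs[best]))
```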

Predicting Multi-Scale Information Diffusion via Minimal Substitution Neural Networks

Ranran Wang (University of Electronic Science and Technology of China, China); Yin Zhang (University of Electronic Science and Technology, China); Wenchao Wan and Xiong Li (University of Electronic Science and Technology of China, China); Min Chen (Huazhong University of Science and Technology, China)

Information diffusion prediction is a complex task due to the numerous variables present in large social platforms like Weibo and Twitter. While many researchers have focused on the internal influence of individual cascades, they often overlook other influential factors, including competition and cooperation among information items, the attractiveness of information to users, and the potential impact of content anticipation on further diffusion. Traditional methods that model each piece of information in isolation struggle to account for these aspects comprehensively. To address these issues, we propose MIDPMS, a multi-scale information diffusion prediction method built on a minimal substitution neural network. Specifically, to enable macro-scale popularity prediction and micro-scale diffusion prediction simultaneously, we model information diffusion as a substitution process among different information sources. Furthermore, accounting for the life cycle of content, user preferences, and potential content anticipation, we introduce minimal substitution theory and design a minimal substitution neural network to model this substitution system and facilitate joint training of the macroscopic and microscopic prediction tasks. Extensive experiments on Weibo and Twitter datasets demonstrate that MIDPMS significantly outperforms state-of-the-art methods on both datasets across both multi-scale tasks.
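One way to picture the substitution view (purely illustrative dynamics, not the authors' neural model) is items competing for a finite stream of attention via attractiveness scores that decay over the content life cycle:

```python
import numpy as np

# Cascades compete for a finite pool of user attention; each item's share
# of newly arriving attention is a softmax over its current attractiveness,
# which decays over its life cycle. Values are invented for illustration.
attract = np.array([2.0, 1.0, 0.5])   # per-item attractiveness
decay = np.array([0.9, 0.95, 0.99])   # life-cycle decay per step
pop = np.zeros(3)                     # cumulative popularity
for t in range(50):
    share = np.exp(attract) / np.exp(attract).sum()  # substitution weights
    pop += 100 * share                               # 100 new views/step
    attract *= decay
print(pop.round(1))
```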

Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference

Huaiguang Cai (Sun Yat-Sen University, China); Zhi Zhou (Sun Yat-sen University, China); Qianyi Huang (Sun Yat-Sen University, China & Peng Cheng Laboratory, China)

Due to several kinds of drift, the traditional computing paradigm of deploying a trained model and then performing inference can no longer meet accuracy requirements. Accordingly, a new computing paradigm has emerged in which the model is retrained and performs inference simultaneously on new data after deployment (we call it model inference and retraining co-location). The key challenge is how to allocate computing resources between model retraining and inference to improve long-term accuracy, especially when computing resources change dynamically.
We address this challenge by first modeling the relationship between model performance and different retraining and inference configurations, and then proposing a linear-complexity online algorithm (named \ouralg).
\ouralg approximately solves the original non-convex, integer, time-coupled problem by adjusting the proportion between model retraining and inference according to the available real-time computing resources. The competitive ratio of \ouralg is strictly better than the tight competitive ratio of the Inference-Only algorithm (corresponding to the traditional computing paradigm) when data drift persists for a sufficiently long time, demonstrating the advantages and applicability of the inference and retraining co-location paradigm. In particular, \ouralg translates into several heuristic algorithms in different environments. Experiments based on real scenarios confirm the effectiveness of \ouralg.
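A minimal sketch of the co-location paradigm (an invented drift-reactive splitting rule, not \ouralg itself) might look like:

```python
# Illustrative online heuristic: each step, split the available compute C_t
# between retraining and inference, shifting toward retraining when recent
# accuracy drops (a drift signal). All constants below are assumptions.
def split_compute(C_t, acc_recent, acc_target=0.9,
                  min_infer_frac=0.5, gain=2.0):
    drift = max(0.0, acc_target - acc_recent)
    retrain_frac = min(1.0 - min_infer_frac, gain * drift)
    return (1.0 - retrain_frac) * C_t, retrain_frac * C_t

for acc in (0.92, 0.85, 0.70):
    infer, retrain = split_compute(C_t=100.0, acc_recent=acc)
    print(f"acc={acc:.2f}: inference={infer:.0f}, retraining={retrain:.0f}")
```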

Tomtit: Hierarchical Federated Fine-Tuning of Giant Models based on Autonomous Synchronization

Tianyu Qi and Yufeng Zhan (Beijing Institute of Technology, China); Peng Li (The University of Aizu, Japan); Yuanqing Xia (Beijing Institute of Technology, China)

With the rapid evolution of giant models, the paradigm of pre-training models and then fine-tuning them for downstream tasks has become increasingly popular. The adapter has been recognized as an efficient fine-tuning technique and has attracted much research attention. However, adapter-based fine-tuning still faces the challenge of insufficient data. Federated fine-tuning has recently been proposed to fill this gap, but existing solutions suffer from a serious scalability issue and are inflexible in handling dynamic edge environments. In this paper, we propose Tomtit, a hierarchical federated fine-tuning system that significantly accelerates fine-tuning and improves the energy efficiency of devices. Through an extensive empirical study, we find that model synchronization schemes (i.e., when edges and devices should synchronize their models) play a critical role in federated fine-tuning. The core of Tomtit is a distributed design that gives each edge and device a unique synchronization scheme matched to its model structure, data distribution, and computing capability. Furthermore, we provide a theoretical guarantee on the convergence of Tomtit. Finally, we develop a prototype of Tomtit and evaluate it on a testbed. Experimental results show that it significantly outperforms the state of the art.
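To illustrate what a per-device synchronization scheme could look like (the scoring rule below is an assumption for illustration, not Tomtit's actual policy):

```python
# Heterogeneity-aware synchronization sketch: each device runs local
# adapter updates and only syncs with its edge every `interval` rounds.
# Here the interval grows with compute speed and shrinks with gradient
# divergence (a proxy for non-IID data); the formula is invented.
devices = [
    {"name": "d0", "speed": 1.0, "divergence": 0.10},
    {"name": "d1", "speed": 0.5, "divergence": 0.40},
    {"name": "d2", "speed": 2.0, "divergence": 0.05},
]
for d in devices:
    # Fast devices with near-IID data can safely sync less often.
    d["interval"] = max(1, round(4 * d["speed"] / (1 + 10 * d["divergence"])))
    print(d["name"], "syncs every", d["interval"], "local rounds")
```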

Session Chair

Marco Fiore (IMDEA Networks Institute, Spain)

Session E-10: Machine Learning 4

Conference: 1:30 PM — 3:00 PM PDT
Local: May 23 Thu, 4:30 PM — 6:00 PM EDT
Location: Regency E

Augment Online Linear Optimization with Arbitrarily Bad Machine-Learned Predictions

Dacheng Wen (The University of Hong Kong, Hong Kong); Yupeng Li (Hong Kong Baptist University, Hong Kong); Francis C.M. Lau (The University of Hong Kong, Hong Kong)

The online linear optimization paradigm is important to many real-world network applications as well as theoretical algorithmic studies. Recent studies have attempted to augment online linear optimization with machine-learned predictions of the cost function that are meant to improve the performance of the learner. However, they fail to address the realistic case where the predictions can be arbitrarily bad. In this work, we take the first step in studying online linear optimization with a dynamic number of arbitrarily bad machine-learned predictions per round and propose an algorithm termed OLOAP. Our theoretical analysis shows that, when the predictions are of satisfactory quality, OLOAP achieves a regret bound of O(log T), which circumvents the tight lower bound of Ω(√T) for the vanilla problem of online linear optimization (i.e., the one without any predictions). Meanwhile, the regret of our algorithm is never worse than O(√T), irrespective of the quality of the predictions. In addition, we derive a lower bound on the regret of the studied problem, which demonstrates that OLOAP is near-optimal. We consider two important network applications and conduct extensive evaluations. Our results validate the superiority of our algorithm over state-of-the-art approaches.
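OLOAP's construction is in the paper; a generic sketch of how such robustness-consistency trade-offs are often obtained is to hedge between a prediction-following expert and a standard no-regret learner, e.g.:

```python
import numpy as np

rng = np.random.default_rng(3)
d, T, eta = 5, 1000, 0.05
w = np.ones(2)                      # meta-weights: [follow-prediction, OGD]
x_ogd = np.zeros(d)
total = 0.0

for t in range(T):
    cost = rng.normal(size=d)                     # true cost vector
    pred = cost + rng.normal(scale=3.0, size=d)   # possibly bad prediction
    x_pred = -pred / max(np.linalg.norm(pred), 1e-9)  # trust the prediction
    p = w / w.sum()
    x_play = p[0] * x_pred + p[1] * x_ogd             # hedged action
    total += cost @ x_play
    # meta-update: downweight the expert that incurred more cost this round
    w *= np.exp(-eta * np.array([cost @ x_pred, cost @ x_ogd]))
    # robust expert: online gradient descent, projected onto the unit ball
    x_ogd = x_ogd - 0.1 * cost
    x_ogd /= max(np.linalg.norm(x_ogd), 1.0)

print("cumulative cost:", round(total, 2))
```

When predictions are accurate the meta-learner concentrates on the prediction expert; when they are arbitrarily bad, it falls back to the OGD expert's worst-case guarantee.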

Dancing with Shackles, Meet the Challenge of Industrial Adaptive Streaming via Offline Reinforcement Learning

Lianchen Jia (Tsinghua University, China); Chao Zhou (Beijing Kuaishou Technology Co., Ltd, China); Tianchi Huang, Chaoyang Li and Lifeng Sun (Tsinghua University, China)

Adaptive video streaming has been studied for over 10 years and has demonstrated remarkable performance. However, adaptive video streaming is not an independent algorithm; it relies on other components of the video system. Consequently, as those components are optimized, the gap between traditional simulators and the real-world system keeps growing, and the adaptive streaming algorithm must adapt to these variations. To address the challenges facing industrial adaptive video streaming, we introduce Backwave, a novel offline reinforcement learning framework that leverages history logs to reduce the sim-to-real gap. We propose new metrics based on counterfactual reasoning to evaluate its performance, integrate expert knowledge to generate valuable data that mitigates the issue of data override, and employ curriculum learning to minimize additional errors. We deployed Backwave on Kuaishou, a mainstream commercial short-video platform. In a series of A/B tests conducted over nearly one month with more than 400M daily watch times, Backwave consistently outperforms prior algorithms. Specifically, it reduces stall time by 0.45% to 8.52% while maintaining comparable video quality, and improves average play duration by 0.12% to 0.16% and overall play duration by 0.12% to 0.26%.
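Backwave's pipeline is proprietary; as a generic stand-in for offline RL on logged ABR data, here is a tabular Q-learning loop with a CQL-style conservatism penalty that keeps the learned policy close to the behaviors seen in the logs (states, bitrate actions, and rewards are synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)

# Logged tuples from a production ABR policy: state index, bitrate action,
# QoE reward, next state. A tiny tabular stand-in for real history logs.
S, A = 8, 3
logs = [(rng.integers(S), rng.integers(A), rng.normal(), rng.integers(S))
        for _ in range(5000)]

Q = np.zeros((S, A))
alpha, gamma, beta = 0.1, 0.9, 1.0
for s, a, r, s2 in logs:
    # standard off-policy TD update on the logged transition
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    # conservative penalty (gradient of logsumexp(Q) - Q[s, a]): pushes
    # down Q on actions the logs do not support
    Q[s] -= alpha * beta * (np.exp(Q[s]) / np.exp(Q[s]).sum() - np.eye(A)[a])

print("greedy bitrate per state:", Q.argmax(axis=1))
```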

GraphProxy: Communication-Efficient Federated Graph Learning with Adaptive Proxy

Junyang Wang, Lan Zhang, Junhao Wang, Mu Yuan and Yihang Cheng (University of Science and Technology of China, China); Qian Xu (BestPay Co.,Ltd,China Telecom, China); Bo Yu (Bestpay Co., Ltd, China Telecom, China)

Federated graph learning (FGL) enables multiple participants with distributed but connected graph data to collaboratively train a model in a privacy-preserving way. However, the high communication cost hinders the adoption of FGL in many resource-limited or delay-sensitive applications. In this work, we focus on reducing the communication cost incurred by transmitting neighborhood information in FGL. We propose to search for local proxies that can substitute for external neighbors, and develop a novel federated graph learning framework named GraphProxy. GraphProxy uses representation similarity and class correlation to select local proxies for external neighbors, and dynamically adjusts the proxy strategy as node representations change during iterative training. We also provide a theoretical analysis showing that a proxy node has a similar influence on training when it is sufficiently similar to the external one. Extensive evaluations show the effectiveness of our design: e.g., GraphProxy achieves an 8x improvement in communication efficiency with only 0.14% performance degradation.
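A minimal sketch of the proxy-selection step, assuming cosine similarity on node representations plus a class-agreement filter (GraphProxy's exact scoring may differ):

```python
import numpy as np

rng = np.random.default_rng(5)

# Node representations: 50 local nodes, plus 5 external neighbors whose
# embeddings would otherwise have to be fetched every training round.
local = rng.normal(size=(50, 16))
external = rng.normal(size=(5, 16))
local_labels = rng.integers(0, 3, 50)
external_labels = rng.integers(0, 3, 5)

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sim = cosine(external, local)                    # representation similarity
same_class = external_labels[:, None] == local_labels[None, :]
sim[~same_class] -= 1e9                          # require class agreement
proxy = sim.argmax(axis=1)                       # local proxy per external
print("proxy node per external neighbor:", proxy)
```

In the full framework this selection would be re-run periodically, since representations drift as training proceeds.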

Learning Context-Aware Probabilistic Maximum Coverage Bandits: A Variance-Adaptive Approach

Xutong Liu (The Chinese University of Hong Kong, Hong Kong); Jinhang Zuo (University of Massachusetts Amherst & California Institute of Technology, USA); Junkai Wang (Fudan University, China); Zhiyong Wang (The Chinese University of Hong Kong, Hong Kong); Yuedong Xu (Fudan University, China); John Chi Shing Lui (Chinese University of Hong Kong, Hong Kong)

Probabilistic maximum coverage (PMC) is an important framework that can model many network applications, including mobile crowdsensing, content delivery, and task replication. In PMC, an operator chooses nodes in a graph that can probabilistically cover other nodes, aiming to maximize the total reward from the covered nodes. To tackle the challenge of unknown parameters in network environments, PMC has been studied in the online learning setting, i.e., as the PMC bandit. However, existing PMC bandits lack context-awareness and fail to exploit valuable contextual information, limiting their efficiency and adaptability in dynamic environments. To address this limitation, we propose a novel context-aware PMC bandit model (C-PMC). C-PMC employs a linear structure to model the mean outcome of each arm, efficiently incorporating contextual features and enhancing its applicability to large-scale network systems. We then design a variance-adaptive contextual combinatorial upper confidence bound algorithm (VAC2UCB), which utilizes second-order statistics, specifically the variance, to re-weight feedback data and estimate unknown parameters. Our theoretical analysis shows that C-PMC achieves a regret of Õ(d√(VT)), independent of the number of edges E and the action size K. Finally, we conduct experiments on synthetic and real-world datasets, showing the superior performance of VAC2UCB in context-aware mobile crowdsensing and user-targeted content delivery applications.
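The flavor of variance-adaptive estimation can be sketched with a variance-weighted ridge regression and a UCB score (the toy below uses oracle variances and invented constants; VAC2UCB's confidence radius and variance estimator are derived in the paper):

```python
import numpy as np

rng = np.random.default_rng(6)
d, lam = 4, 1.0
theta_true = rng.normal(size=d) / np.sqrt(d)

V = lam * np.eye(d)   # variance-weighted design matrix
b = np.zeros(d)

for t in range(500):
    x = rng.normal(size=d); x /= np.linalg.norm(x)     # arm context
    p = np.clip(x @ theta_true + 0.5, 0.05, 0.95)      # coverage prob.
    y = rng.random() < p                               # Bernoulli feedback
    var = p * (1 - p)                                  # oracle variance
    w = 1.0 / max(var, 1e-2)                           # variance re-weighting
    V += w * np.outer(x, x)
    b += w * y * x

theta_hat = np.linalg.solve(V, b)
x_new = rng.normal(size=d); x_new /= np.linalg.norm(x_new)
ucb = x_new @ theta_hat + np.sqrt(x_new @ np.linalg.solve(V, x_new))
print(theta_hat.round(2), round(float(ucb), 3))
```

Low-variance observations get more weight, which is what tightens the regret to depend on the cumulative variance V rather than on T alone.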

Session Chair

Walter Willinger (NIKSUN, USA)

Session E-11: Machine Learning 5

Conference: 3:30 PM — 5:00 PM PDT
Local: May 23 Thu, 6:30 PM — 8:00 PM EDT
Location: Regency E

Taming Subnet-Drift in D2D-Enabled Fog Learning: A Hierarchical Gradient Tracking Approach

Evan Chen (Purdue University, USA); Shiqiang Wang (IBM T. J. Watson Research Center, USA); Christopher G. Brinton (Purdue University, USA)

Federated learning (FL) encounters scalability challenges when implemented over fog networks. Semi-decentralized FL (SD-FL) addresses this by dividing model cooperation into two stages: at the lower stage, device-to-device (D2D) communication is employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes rely on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates at each communication layer. Our analytical characterization of SD-GT reveals convergence upper bounds for both non-convex and strongly-convex problems under a suitable choice of step size. We use the resulting bounds to develop a co-optimization algorithm that tunes subnet sampling rates and the number of D2D rounds according to a performance-efficiency trade-off. Our numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to SD-FL and gradient-tracking baselines on several datasets.
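Gradient tracking itself is standard; a minimal single-subnet sketch on quadratic local losses (mixing matrix and step size are illustrative) shows the tracking-term update that SD-GT builds on:

```python
import numpy as np

rng = np.random.default_rng(7)

# Decentralized gradient tracking inside one D2D subnet: n devices with
# quadratic local losses f_i(x) = 0.5 * (x - c_i)^2 whose optima differ
# (heterogeneous data). y_i tracks an estimate of the global gradient.
n, steps, lr = 5, 200, 0.1
c = rng.normal(size=n)                 # local optima differ across devices
W = np.full((n, n), 1.0 / n)           # doubly-stochastic mixing matrix

x = np.zeros(n)
g = x - c                              # local gradients
y = g.copy()                           # tracking variable
for _ in range(steps):
    x = W @ x - lr * y                 # mix, then descend along tracker
    g_new = x - c
    y = W @ y + g_new - g              # gradient-tracking update
    g = g_new

print(x.round(3), "target:", c.mean().round(3))
```

All devices converge to the minimizer of the average loss without any bounded-gradient-diversity assumption, which is exactly why the tracking term is useful under heterogeneity.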

Towards Efficient Asynchronous Federated Learning in Heterogeneous Edge Environments

Yajie Zhou (Zhejiang University, China); Xiaoyi Pang (Wuhan University, China); Zhibo Wang and Jiahui Hu (Zhejiang University, China); Peng Sun (Hunan University, China); Kui Ren (Zhejiang University, China)

Federated learning (FL) is widely used in edge environments as a privacy-preserving collaborative learning paradigm. However, edge devices often have heterogeneous computation capabilities and data distributions, hampering the efficiency of co-training. Existing works develop staleness-aware semi-asynchronous FL that reduces the contribution of slow devices to the global model to mitigate their negative impact. But this prevents data on slow devices from being fully leveraged in global model updating, exacerbating the effects of data heterogeneity. In this paper, to cope with both system and data heterogeneity, we propose EAFL, a clustering and two-stage aggregation-based Efficient Asynchronous Federated Learning framework that achieves better learning performance with higher efficiency in heterogeneous edge environments. In EAFL, we first propose a gradient-similarity-based dynamic clustering mechanism that groups devices with similar system and data characteristics during training. We then develop a novel two-stage aggregation strategy consisting of staleness-aware semi-asynchronous intra-cluster aggregation and data-size-aware synchronous inter-cluster aggregation to efficiently and comprehensively aggregate training updates across heterogeneous clusters. This simultaneously alleviates the negative impacts of slow devices and non-IID data, achieving efficient collaborative learning. Extensive experiments demonstrate that EAFL is superior to state-of-the-art methods.
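A toy rendering of the two-stage aggregation (the clustering rule and weighting formulas below are invented stand-ins, not EAFL's mechanisms):

```python
import numpy as np

rng = np.random.default_rng(8)

# 6 devices with flattened gradient updates; cluster crudely by gradient
# direction, then staleness-weight inside clusters and data-size-weight
# across clusters.
grads = rng.normal(size=(6, 10))
stale = np.array([0, 2, 1, 0, 3, 1])          # rounds behind, per device
sizes = np.array([100, 80, 120, 60, 90, 110])  # local dataset sizes
unit = grads / np.linalg.norm(grads, axis=1, keepdims=True)
labels = (unit @ unit[0] < 0).astype(int)      # 2-cluster sign split

cluster_updates, cluster_sizes = [], []
for k in (0, 1):
    idx = np.where(labels == k)[0]
    if idx.size == 0:
        continue
    w = sizes[idx] / (1.0 + stale[idx])        # discount stale devices
    cluster_updates.append((w[:, None] * grads[idx]).sum(0) / w.sum())
    cluster_sizes.append(sizes[idx].sum())

cw = np.array(cluster_sizes, dtype=float)      # size-aware inter-cluster
global_update = (cw[:, None] * np.array(cluster_updates)).sum(0) / cw.sum()
print(global_update.round(3))
```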

Personalized Prediction of Bounded-Rational Bargaining Behavior in Network Resource Sharing

Haoran Yu and Fan Li (Beijing Institute of Technology, China)

There have been many studies leveraging bargaining to incentivize the sharing of network resources between resource owners and seekers. They predict bargaining behavior and outcomes mainly by assuming that bargainers are fully rational and possess sufficient knowledge about their opponents. Our work addresses the prediction of bargaining behavior in network resource sharing scenarios where these assumptions do not hold, i.e., where bargainers are bounded-rational and have heterogeneous knowledge. Our first key idea is to use a multi-output Long Short-Term Memory (LSTM) neural network to learn bargainers' behavior patterns and predict both their discrete and continuous decisions. Our second key idea is to assign a unique latent vector to each bargainer, characterizing the heterogeneity among bargainers. We propose a scheme to jointly learn the LSTM weights and latent vectors from real bargaining data, and utilize them to achieve personalized behavior prediction. We prove that estimating our LSTM weights corresponds to a special design of LSTM training, and we theoretically characterize the performance of our scheme. To handle large-scale datasets in practice, we further propose a variant of our scheme that accelerates LSTM training. Experiments on a large real-world bargaining dataset demonstrate that our schemes achieve more accurate personalized predictions than baselines.
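A minimal PyTorch sketch of the latent-vector idea, with assumed dimensions and head names (not the authors' exact architecture): a per-bargainer embedding is concatenated to each input step, and two heads emit the discrete and continuous decisions.

```python
import torch
import torch.nn as nn

class BargainLSTM(nn.Module):
    """Multi-output LSTM with a learned latent vector per bargainer."""
    def __init__(self, n_bargainers, feat_dim=8, latent_dim=4, hidden=32):
        super().__init__()
        self.latent = nn.Embedding(n_bargainers, latent_dim)
        self.lstm = nn.LSTM(feat_dim + latent_dim, hidden, batch_first=True)
        self.head_discrete = nn.Linear(hidden, 3)    # e.g., accept/reject/counter
        self.head_continuous = nn.Linear(hidden, 1)  # e.g., counter-offer price

    def forward(self, feats, bargainer_id):
        z = self.latent(bargainer_id)                     # (B, latent)
        z = z.unsqueeze(1).expand(-1, feats.size(1), -1)  # tile over rounds
        h, _ = self.lstm(torch.cat([feats, z], dim=-1))
        return self.head_discrete(h[:, -1]), self.head_continuous(h[:, -1])

model = BargainLSTM(n_bargainers=100)
feats = torch.randn(2, 5, 8)                 # 2 sequences, 5 rounds each
ids = torch.tensor([3, 42])                  # which bargainer each belongs to
logits, offer = model(feats, ids)
print(logits.shape, offer.shape)             # (2, 3) and (2, 1)
```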

PPGSpotter: Personalized Free Weight Training Monitoring Using Wearable PPG Sensor

Xiaochen Liu, Fan Li, Yetong Cao, Shengchun Zhai and Song Yang (Beijing Institute of Technology, China); Yu Wang (Temple University, USA)

Free weight training (FWT) is of utmost importance for physical well-being. However, the success of FWT depends on choosing the suitable workload, as improper selections can lead to suboptimal outcomes or injury. Current workload estimation approaches rely on manual recording and specialized equipment, with limited feedback. Therefore, we introduce PPGSpotter, a novel PPG-based system for FWT monitoring in a convenient, low-cost, and fine-grained manner. By characterizing the arterial geometry compressions caused by the deformation of distinct muscle groups during various exercises and workloads in PPG signals, PPGSpotter can infer essential FWT factors such as workload, repetitions, and exercise type. To remove pulse-related interference that heavily contaminates PPG signals, we develop an arterial interference elimination approach based on adaptive filtering, effectively extracting the pure motion-derived signal (MDS). Furthermore, we explore 2D representations within the phase space of MDS to extract spatiotemporal information, enabling PPGSpotter to address the challenge of resisting sensor shifts. Finally, we leverage a multi-task CNN-based model with workload adjustment guidance to achieve personalized FWT monitoring. Extensive experiments with 15 participants confirm that PPGSpotter can achieve workload estimation (0.59 kg RMSE), repetitions estimation (0.96 reps RMSE), and exercise type recognition (91.57% F1-score) while providing valid workload adjustment recommendations.
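The adaptive-filtering step can be sketched with a basic LMS canceller (all signals below are synthetic; in practice the pulse reference would itself have to be estimated, e.g., from heart-rate tracking, rather than being known exactly):

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy LMS adaptive filter: cancel a pulse-like reference from the raw PPG,
# leaving the motion-derived signal (MDS) caused by muscle deformation.
fs = 100
t = np.arange(0, 10, 1 / fs)
pulse = np.sin(2 * np.pi * 1.2 * t)                  # ~72 bpm pulse wave
motion = 0.5 * np.sign(np.sin(2 * np.pi * 0.4 * t))  # lifting cadence
ppg = pulse + motion + 0.05 * rng.normal(size=t.size)

L, mu = 8, 0.01
w = np.zeros(L)
mds = np.zeros(t.size)
for n in range(L, t.size):
    ref = pulse[n - L:n]        # reference correlated with the pulse only
    e = ppg[n] - w @ ref        # error = motion + noise estimate
    w += mu * e * ref           # LMS weight update
    mds[n] = e

print("residual error power:", np.mean((mds[200:] - motion[200:]) ** 2))
```

The recovered MDS would then feed the phase-space 2D representation and the multi-task CNN described above.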
Speaker: Xiaochen Liu (Beijing Institute of Technology, China)



Session Chair

Yuval Shavitt (Tel-Aviv University, Israel)
